
Conversation

@Hermit-w Hermit-w commented Nov 6, 2025

xLLM launch parameters:

export PYTHON_INCLUDE_PATH="$(python3 -c 'from sysconfig import get_paths; print(get_paths()["include"])')"
export PYTHON_LIB_PATH="$(python3 -c 'import sysconfig; print(sysconfig.get_config_var("LIBDIR"))')"  # the original re-used the "include" path here, almost certainly a copy-paste slip; LIBDIR is the likely intent
export PYTORCH_NPU_INSTALL_PATH=/usr/local/libtorch_npu/
export PYTORCH_INSTALL_PATH="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
export LIBTORCH_ROOT="$(python3 -c 'import torch, os; print(os.path.dirname(os.path.abspath(torch.__file__)))')"
export LD_LIBRARY_PATH=/usr/local/libtorch_npu/lib:$LD_LIBRARY_PATH

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh
# export ASCEND_RT_VISIBLE_DEVICES=0
#export ASCEND_RT_VISIBLE_DEVICES=4,5
export ASDOPS_LOG_TO_STDOUT=1
export ASDOPS_LOG_LEVEL=ERROR
export ATB_LOG_TO_STDOUT=1
# export ASDOPS_LOG_TO_FILE=1 
# export HCCL_BUFFSIZE=1024
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export NPU_MEMORY_FRACTION=0.98
export ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=3
export ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1

export OMP_NUM_THREADS=12

export HCCL_CONNECT_TIMEOUT=7200

\rm -rf /root/atb/log/
\rm -rf /root/ascend/log/
\rm -rf core.*

MODEL_PATH="/export/home/lanliwei.1/models/models/DeepSeek-V3"
MASTER_NODE_ADDR="11.87.49.111:9590"
START_PORT=18999
START_DEVICE=0
LOG_DIR="log"
NNODES=16
WORLD_SIZE=16

export HCCL_IF_BASE_PORT=43439


for (( i=0; i<$NNODES; i++ ))
do
  PORT=$((START_PORT + i))
  DEVICE=$((START_DEVICE + i))
  LOG_FILE="$LOG_DIR/node_$i.log"
    /export/home/lanliwei.1/code/mla_xllm_customize/xllm/build/xllm/core/server/xllm \
    --model $MODEL_PATH \
    --port $PORT \
    --devices="npu:$DEVICE" \
    --master_node_addr=$MASTER_NODE_ADDR \
    --nnodes=$WORLD_SIZE \
    --node_rank=$i \
    --max_memory_utilization=0.8 \
    --max_tokens_per_batch=20000 \
    --max_seqs_per_batch=2000 \
    --block_size=128 \
    --enable_prefix_cache=false \
    --enable_chunked_prefill=false \
    --communication_backend="hccl" \
    --enable_schedule_overlap=true \
    --enable_mla=true \
    --ep_size=16 \
    --dp_size=4 \
    --enable_customize_mla_kernel \
    > $LOG_FILE 2>&1 &
done
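
Not part of the PR, but once the loop returns, a quick sanity check such as the following sketch can confirm every rank survived startup; the pgrep pattern and the 30-second grace period are assumptions.

# Hypothetical post-launch check, not in the original script.
sleep 30  # arbitrary grace period for the 16 workers to initialize
for (( i=0; i<NNODES; i++ )); do
  # Each launch command carries a unique --node_rank=$i; the trailing space
  # in the pattern keeps rank 1 from also matching ranks 10-15.
  if ! pgrep -f "node_rank=$i " > /dev/null; then
    echo "node $i exited early, see $LOG_DIR/node_$i.log"
  fi
  grep -iE "error|core dumped" "$LOG_DIR/node_$i.log" | head -n 3
done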

Tested with the following script: benchmark.py

Test parameters:

python3 benchmark.py \
 --backend xllm \
 --model /export/home/lanliwei.1/models/models/DeepSeek-V3 \
 --dataset-name random \
 --random-range-ratio 1 \
 --num-prompt 420 \
 --request-rate 2 \
 --max-concurrency 100 \
 --random-input 2048 \
 --random-output 2048 \
 --host 127.0.0.1 \
 --port 18999 \
 --dataset-path /export/home/lanliwei.1/dataset/ShareGPT_V3_unfiltered_cleaned_split.json

Performance comparison:

[performance comparison screenshot attached in the original PR]

From the diff under review:

  [](const std::vector<RawForwardInput>& inputs) {
    return std::all_of(
        inputs.begin(), inputs.end(), [](const RawForwardInput& input) {
          return input.flatten_tokens_vec.size() < 230;
        });
  }

A reviewer (Collaborator) left an inline comment on this snippet:
This magic number needs to be defined separately as constexpr, and the name should indicate what it is.

@Hermit-w (author) replied:

A named constant has been added, with explanatory comments.
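
The merged constant is not quoted in this thread; a minimal sketch of the requested change, with an assumed name and a trimmed stand-in for the real RawForwardInput, might look like:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RawForwardInput {  // stand-in: the real xLLM struct has more fields
  std::vector<int32_t> flatten_tokens_vec;
};

// Assumed name; the identifier in the merged commit is not shown in this
// thread. 230 is the flattened-token ceiling below which the customized
// MLA kernel path is taken.
constexpr size_t kCustomizeMlaKernelMaxTokens = 230;

bool fits_customize_mla_kernel(const std::vector<RawForwardInput>& inputs) {
  return std::all_of(
      inputs.begin(), inputs.end(), [](const RawForwardInput& input) {
        return input.flatten_tokens_vec.size() < kCustomizeMlaKernelMaxTokens;
      });
}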

@LMX-xin LMX-xin commented Nov 7, 2025

LGTM
